A recap of previous courses
Sufficient Statistics
We define a sufficient statistic as a statistic that conveys exactly the same information about the parameter as the entire dataset does.
Fisher-Neyman Factorization Theorem: T(x) is a sufficient statistic for the parameter θ in the parametric model p(x∣θ) if and only if p(x∣θ) = h(x) g_θ(T(x)) for some functions h(x) (which does not depend on θ) and g_θ (which depends on the data only through T(x)).
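As a quick added illustration (not in the original notes): for n i.i.d. Bernoulli(θ) observations, the likelihood factors with h(x) = 1 and T(x) = Σ_i x_i, so the number of successes is sufficient for θ.

```latex
% Fisher-Neyman factorization for an i.i.d. Bernoulli(theta) sample
% x = (x_1, ..., x_n) with x_i in {0, 1}.
\[
p(x \mid \theta)
  = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
  = \underbrace{1}_{h(x)} \;
    \underbrace{\theta^{T(x)} (1-\theta)^{\,n - T(x)}}_{g_\theta(T(x))},
\qquad T(x) = \sum_{i=1}^{n} x_i .
\]
```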
Exponential Families: p(x∣η) = h(x) exp(η^T T(x) − A(η)) (a worked Bernoulli example is given after the list), where:
- T(x) is a sufficient statistic
- η is the natural parameter
- A(η) is the log-partition function
- h(x) is the carrier (base) measure
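A worked example of the exponential-family form above (added for illustration): the Bernoulli distribution can be rewritten so that the natural parameter is the log-odds and the log-partition function is the softplus.

```latex
% Bernoulli(theta) written in exponential-family form.
\[
p(x \mid \theta) = \theta^{x} (1-\theta)^{1-x}
  = \exp\!\Big( x \log\tfrac{\theta}{1-\theta} + \log(1-\theta) \Big)
  = h(x)\, \exp\big( \eta\, T(x) - A(\eta) \big),
\]
\[
h(x) = 1, \qquad T(x) = x, \qquad
\eta = \log\tfrac{\theta}{1-\theta}, \qquad
A(\eta) = \log\big(1 + e^{\eta}\big) = -\log(1-\theta).
\]
```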
Decision Theory
Estimating p(x,c) from training data is an example of inference.
In a decision problem, the decision rule assigns each x to the class j whose expected loss Σ_k L_kj p(C_k∣x) is smallest, where L_kj is the loss incurred by deciding class j when the true class is C_k. Equivalently, the decision regions are R_j = {x : Σ_k L_kj p(C_k∣x) < Σ_k L_ki p(C_k∣x), ∀ i ≠ j}.
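A minimal sketch of this rule in code, assuming a hypothetical two-class problem with made-up class-conditional densities, equal priors, and an asymmetric loss matrix (none of these numbers are from the notes); it evaluates Σ_k L_kj p(C_k∣x) on a grid and picks the minimizing class.

```python
import numpy as np

# Loss matrix L[k, j] = cost of deciding class j when the true class is k.
L = np.array([[0.0, 1.0],    # true class 0: deciding 1 costs 1
              [5.0, 0.0]])   # true class 1: deciding 0 costs 5 (asymmetric)

x = np.linspace(-4.0, 4.0, 401)
# Hypothetical class-conditional densities with equal priors, just for illustration.
p_x_given_c = np.stack([np.exp(-0.5 * (x + 1) ** 2),
                        np.exp(-0.5 * (x - 1) ** 2)])           # shape (2, len(x))
posterior = p_x_given_c / p_x_given_c.sum(axis=0, keepdims=True)  # p(C_k | x)

# Expected loss of each decision j: sum_k L[k, j] * p(C_k | x).
expected_loss = L.T @ posterior                                  # shape (2, len(x))
decision = expected_loss.argmin(axis=0)                          # R_j = {x : j minimizes}

# The asymmetric loss shifts the boundary away from the symmetric point x = 0.
boundary = x[np.argmax(decision == 1)]
print(f"decision boundary ≈ {boundary:.2f}")   # ≈ -0.80 = -ln(5)/2
```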
For regression problems, we want to minimize the expected loss E[L] = ∫∫ L(y(x), t) p(x, t) dx dt, where L(y(x), t) is the loss function. The squared (least-squares) loss L(y(x), t) = (y(x) − t)² leads to the decomposition E[L] = ∫∫ (y(x) − E[t∣x])² p(x, t) dx dt + ∫∫ (E[t∣x] − t)² p(x, t) dx dt. Only the first term depends on y(x), so the expected loss is minimized by choosing y(x) = E[t∣x].
- The second term is the conditional variance of t given x, averaged over x: it does not depend on y(x) and represents irreducible noise. A small Monte Carlo check of this is sketched below.
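The check below uses an assumed toy model t ∣ x ∼ N(sin x, 0.3²) (the model and numbers are illustrative, not from the notes): the predictor y(x) = E[t∣x] = sin x attains the variance floor, and any other predictor pays an extra squared-bias term on top of it.

```python
import numpy as np

# Toy model: x ~ Uniform(0, 2*pi), t | x ~ N(sin(x), 0.3^2).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 2 * np.pi, size=n)
t = np.sin(x) + rng.normal(scale=0.3, size=n)

def expected_loss(y):
    """Monte Carlo estimate of E[(y(x) - t)^2] for a predictor y."""
    return np.mean((y(x) - t) ** 2)

print(expected_loss(np.sin))                     # ≈ 0.09, the Var[t | x] floor
print(expected_loss(lambda x: 0.9 * np.sin(x)))  # larger: adds E[(0.1*sin x)^2] ≈ 0.005
```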
Multivariate Gaussian
Let μ ∈ R^m and let Σ be a symmetric positive-definite m×m matrix. We write X ∼ N_m(μ, Σ) if the pdf of X is f(x) = (2π)^{−m/2} |Σ|^{−1/2} exp(−½ (x−μ)^T Σ^{−1} (x−μ)). The multivariate Gaussian has several useful properties (a numerical check follows the list):
- each marginal X_i is Gaussian with mean μ_i and variance σ_i² = Σ_ii.
- the conditional distribution of X_j given X_i = x_i is Gaussian with mean μ_j + Σ_ji Σ_ii^{−1} (x_i − μ_i) and variance Σ_jj − Σ_ji Σ_ii^{−1} Σ_ij.
- X_i and X_j are independent if and only if Σ_ij = 0 (so the components are mutually independent iff Σ is diagonal).
- X_i ⊥ X_j ∣ X_k (with X_k collecting the remaining coordinates) ⟺ Σ_ij = Σ_ik Σ_kk^{−1} Σ_kj ⟺ (Σ^{−1})_ij = 0.
- if Y = AX + b with A ∈ R^{n×m} and b ∈ R^n, then Y ∼ N_n(Aμ + b, A Σ A^T).
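A numerical check of the conditioning and affine-transformation properties, using an assumed 2×2 covariance (the numbers are illustrative, not from the notes):

```python
import numpy as np

# Bivariate Gaussian with an assumed mean and covariance.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Conditional of X_2 given X_1 = x1:
#   mean = mu_2 + Sigma_21 Sigma_11^{-1} (x1 - mu_1)
#   var  = Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
x1 = 2.0
cond_mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x1 - mu[0])
cond_var = Sigma[1, 1] - Sigma[1, 0] / Sigma[0, 0] * Sigma[0, 1]
print(cond_mean, cond_var)   # -2.0 + 0.4*(2.0 - 1.0) = -1.6,  1.0 - 0.32 = 0.68

# Affine transform: Y = A X + b is Gaussian with mean A mu + b, covariance A Sigma A^T.
A = np.array([[1.0, 1.0]])   # Y = X_1 + X_2
b = np.array([0.0])
print(A @ mu + b, A @ Sigma @ A.T)   # mean -1.0, variance 2 + 1 + 2*0.8 = 4.6
```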
Bayesian Inference
We typically use Bayesian inference for the latent-variable model p(x, z) = p(z) p(x∣z), where:
- x are the observations or data
- z are the unobserved or latent variables
- p(z) is the prior distribution of z
- p(x∣z) is the likelihood function of x given z
- p(z∣x) is the posterior distribution of z given x, i.e., the conditional distribution of the unobserved variables given the observed data. By Bayes' rule, p(z∣x) = p(x∣z) p(z) / p(x), where p(x) = ∫ p(x∣z) p(z) dz is the marginal likelihood.
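A minimal numerical sketch of this posterior computation, assuming a toy model with a uniform prior on a coin bias z and a binomial likelihood (the model is illustrative, not from the notes); the marginal likelihood ∫ p(x∣z) p(z) dz is approximated by a Riemann sum over a grid.

```python
import numpy as np
from math import comb

# Latent z = coin bias, prior z ~ Uniform(0, 1), likelihood x | z ~ Binomial(n, z).
z = np.linspace(1e-3, 1 - 1e-3, 1000)
dz = z[1] - z[0]
prior = np.ones_like(z)                     # Uniform(0, 1) prior density
n_trials, x_obs = 10, 7
likelihood = comb(n_trials, x_obs) * z**x_obs * (1 - z)**(n_trials - x_obs)  # p(x | z)

marginal = np.sum(likelihood * prior) * dz  # p(x) = ∫ p(x | z) p(z) dz (Riemann sum)
posterior = likelihood * prior / marginal   # p(z | x) = p(x | z) p(z) / p(x)

print(marginal)                             # ≈ 1/11 for a uniform prior
print(np.sum(z * posterior) * dz)           # posterior mean ≈ (7+1)/(10+2) ≈ 0.667
```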